Apparatus and method for conversion of structured data between different formats

ABSTRACT

Apparatus and method for performing hierarchical type mask encoding and data transformation includes locating a data source, loading the data source into a temporary storage, encoding hierarchical heuristics for buffer transformation, and delivering the data for transformation and scalar type encoding, reduction, compression, iteration, type extension and versioning if required.

This application is a continuation of application Ser. No. 08/500,420 filed on Jul. 10, 1995, which was abandoned upon the filing hereof.

BACKGROUND OF THE INVENTION

1. Field of the Invention

The present invention relates to information handling systems, and more particularly to information handling systems in which data exist in a number of different formats which must be transformed for use in the information handling system.

2. Prior Art

U.S. Pat. No. 4,849,810, entitled "Hierarchial Encoding Method and Apparatus for Efficiently Communicating Image Sequences", teaches a method and apparatus for encoding interframe error data in an image transmission system, and in particular in a motion compensated image transmission system for transmitting a sequence of image frames from a transmitter to a receiver, and employs hierarchical vector quantization and arithmetic coding to increase the data compression of the images being transmitted.

The patent deals only with binary images and successive images and is used primarily for reducing error approximations and playing back images. The patent does not teach a hierarchy that is data type oriented, does not include reduction, iteration, or type compression, and does not teach versioning or type extension.

U.S. Pat. No. 5,155,594, entitled "Hierarchial Encoding Method and Apparatus Employing Background References for Efficiently Communicating Image Sequences", teaches a method and apparatus for transmitting a sequence of image frames by encoding interframe error data. The patent teaches the steps of compiling a spatially decomposed image of a background of the sequence of image frames, spatially decomposing a warped image of a previous frame, and spatially decomposing a new input image. The spatially decomposed input image is compared with the spatially decomposed background image and the spatially decomposed warped image. An error signal defining the spatially decomposed input image is generated based on these comparisons. The '594 patent is an extension of the '810 patent discussed above. The limitations of the '810 patent apply equally to the '594 patent.

U.S. Pat. No. 4,858,017, entitled "System and Method for Hierarchial Image Encoding and Decoding", teaches hierarchical encoding of an image by deriving a picture element array of binary coded intensity values representing the image and then deriving from the intensity values of the picture element array a sequence of hierarchical codes consisting of one of the intensity values as the first code followed by n-1 cyclic differences between n-1 distinct pairs of intensity values, where n is the total number of intensity values of the picture element array.

The patent deals with image compression and Huffman encoding and does not include a data type oriented hierarchy, nor does it include reduction, iteration or type compression.

U.S. Pat. No. 5,297,219, entitled "Transforms for Digital Images in an Hierarchial Environment", teaches an image processing system which transforms higher resolution images to lower resolution images.

The byte oriented data hierarchy of the '219 patent only captures redundant resolution. The hierarchy is not data type oriented and does not include reduction, iteration or type compression, and further does not provide versioning or type extensions.

U.S. Pat. No. 5,150,209, entitled "Hierarchial Entropy Coded Lattice Threshold Quantitization Encoding Method and Apparatus for Image and Video Compression", teaches a method and apparatus for encoding interframe error data in an image transmission system, and in particular in a motion compensated image transmission system for transmitting a sequence of image frames from a transmitter to a receiver, including hierarchical entropy coded lattice threshold quantization to increase the data compression of the images being transmitted.

As above, the '209 patent only deals with image compression and does not include a data type oriented hierarchy, nor does it include reduction, iteration or type compression.

U.S. Pat. No. 5,148,272, entitled "Apparatus for Recombining Prioritized Video Data", teaches an apparatus which decodes a high definition television signal conveyed as high and low priority data in high and low priority channels, respectively, wherein the high and low priority data originated from blocks of hierarchically encoded compressed video data, with data of greater importance for image reproduction of each block allocated to the high priority channel and the remaining data from each block allocated to the low priority channel.

Although the '272 patent deals with hierarchically encoded data, it does not include reduction, iteration, or type compression, nor does it provide for versioning or type extension.

U.S. Pat. No. 5,121,448, entitled "Method and Apparatus for High Speed Editing of Progressively Encoded Images", teaches an image editing method and apparatus wherein low resolution image data among hierarchically encoded image data are decoded. The decoded low resolution image data are subjected to editing, editing data representative of the editing are stored, the hierarchically encoded image data are decoded to obtain original image data, and the decoded original image data are subjected to the editing in accordance with the stored editing data.

The patent deals with replaying of an image and does not include a hierarchy which is data type oriented, nor does it include reduction, iteration or type compression, nor versioning or type extension.

U.S. Pat. No. 5,153,749, entitled "Image Encoding Apparatus", teaches an image encoding apparatus which includes a conversion circuit for converting binary images into multivalue image data.

As with some of the other prior art patents discussed above, the '749 patent deals with bit level binary image data and does not deal with a hierarchy which is data type oriented, nor does it include reduction, iteration, type compression, versioning or type extension.

U.S. Pat. No. 5,315,655, entitled "Method and Apparatus for Encoding Data Objects on a Computer System", teaches a method and apparatus for real time encoding and decoding of data on a computer system. The patented apparatus or method is used with a utility which causes data objects to be encoded and decoded. The utilities include data compression utilities, data encryption utilities, and security utilities. The patented method involves the steps of opening an encoded data object, starting operation of an encoding/decoding apparatus, encoding a decoded data object from a list of decoded data objects, removing the decoded data object from the list of decoded data objects, decoding the encoded data object, posting the encoded data object to a list of decoded data objects, and invoking an application associated with the data object just decoded.

The patent refers to a document architecture hierarchy. However, the patent does not provide reduction, compression, iteration, type extension, or versioning.

The prior art described above does not provide for hierarchical type mask encoding and transformation of data and instructions in mixed formats.

SUMMARY OF THE INVENTION

Therefore, it is an object of the present invention to efficiently convert data and instructions in mixed formats with a mechanism for transformation and encoding which supports scalable types, versioning, dynamic type encoding, reduction, compression and iteration while maintaining minimum storage requirements and maximum system performance.

Accordingly, apparatus and method for performing hierarchical type mask encoding and data transformation includes locating a data source, loading the data source into a temporary storage, encoding heuristics for buffer transformation, and delivery of data for transformation and scalar type encoding, reduction, compression, iteration, type extension and versioning if required.

The method and apparatus according to the present invention is powerful and general and allows data in many varied formats to be effectively encoded, transformed and used in a single data processing system.

The foregoing has outlined rather broadly the features and technical advantages of the present invention in order that the detailed description of the invention that follows may be better understood. Additional features and advantages of the invention will be described hereinafter which form the subject of the claims of the invention.

BRIEF DESCRIPTION OF THE DRAWING

For a more complete understanding of the present invention, and the advantages thereof, reference is now made to the following descriptions taken in conjunction with the accompanying drawings, in which:

FIG. 1 is a block diagram of a data processing system implementing the present invention.

FIG. 2 is a diagram of system memory layout including a mechanism for executing the method according to the present invention.

FIG. 3 is a block diagram of the versioning process in accordance with the present invention.

FIG. 4 is a block diagram of the components of the method according to the present invention.

DETAILED DESCRIPTION OF A PREFERRED EMBODIMENT OF THE INVENTION

A family of computers based on the PowerPC allows for single and mixed Endian modes of operation. (See Gillig, How to Create Endian Neutral Software for Portability, IBM Technical Report 54.837.) Sources of data and programs in operating systems, firmware, residual data, non-volatile RAM ("NVRAM") and applications can be in single and mixed Endian formats. The present invention as embodied herein provides transforms of scalar and aggregate data types between the various sources. The present invention as embodied herein is general: the same mechanism used for residual data and NVRAM to operating system conversions can be used between operating systems and for cross address space communication. The present invention as embodied herein is flexible: repetitive and iterative data aggregates are handled. The present invention as embodied herein is efficient: short paths to iteration and repetition reduce unnecessary time and resources in the conversion. The present invention as embodied herein is extensible: versioning and dynamic type encoding are possible.

The present invention as embodied herein executes independently of addressing modes and can be shared among microkernel services, shared services, operating systems, embedded systems, and other non-PowerPC architectures.

Overview

Two primary sources of data exist in PowerPC: hardware and software.

Hardware data sources include:

1. NVRAM

2. ROM

3. Residual Data

4. Firmware in the form of Run-time and Open Firmware programming interfaces

5. System Memory

Software data sources include:

1. Operating system components and applications

2. Local and remote messaging

3. File and data formats

In mixed Endian programming environments, transformations of mixed data formats are more common. Transformation steps include locating the data source, loading the data source into a buffer, encoding heuristics for buffer transformation, and delivery of the data source. Specifically illustrated in this document are the encoding and transformation. Location and delivery mechanisms are implementation or domain dependent.

Referring now to FIG. 1, a data processing system 100 embodying the present invention will be described. Processor 102, which preferably is a PowerPC processor, is connected to a storage bus 103 and an input/output bus 104. Storage bus 103 also connects to a read only memory 106, a nonvolatile random access memory (NVRAM) 108, and system memory 110.

The firmware referred to above may be stored in read only memory 106, NVRAM 108, or it may be loaded into system memory 110. Operating system components and applications are stored in system memory 110.

PCI bus 104, which has become an industry standard for personal computer input/output interfaces, connects processor 102 to a number of user and peripheral device controllers. Display subsystem 112 connects to PCI bus 104 and provides text and graphics data to monitor 114. A keyboard and mouse interface controller 116 controls inputs from keyboard 118 and mouse 120 to PCI bus 104. I/O interface 122 connects to remote devices such as a communications line or the like. DASD controller 124 connects between PCI bus 104 and a number of direct access storage devices 126 where portions of operating systems and applications and data not in current use by processor 102 are stored.

Referring now to FIG. 2, the storage of program information and data in NVRAM 108 and system memory 110 will be described in greater detail.

As discussed above, data may be in mixed Endian formats. Typically, these data are referred to as big Endian type and little Endian type as described in the Gillig paper referenced above.
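As a small illustration of the two byte orders, the sketch below lays out one 32-bit value both ways and shows that converting between them is a byte reversal. The value and the helper name `byte_reverse` are chosen here for illustration only and do not come from the specification.

```python
value = 0x11223344

# The same 32-bit value laid out in the two byte orders.
big = value.to_bytes(4, "big")        # most significant byte first
little = value.to_bytes(4, "little")  # least significant byte first

def byte_reverse(raw):
    """Reverse the byte order of one scalar (hypothetical helper)."""
    return raw[::-1]

# Converting one layout into the other is a plain byte reversal.
converted = byte_reverse(big)
print(big.hex(), little.hex(), converted.hex())
```

A single-byte scalar is unchanged by this reversal, which is why runs of such scalars can be skipped during transformation.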

Referring now to FIG. 2, various components stored in system memory 110 will be described. Hierarchical type mask encoding and transform module (HTMET) 202 is a key element of the preferred embodiment of the present invention. Data in big Endian type 204 and data in little Endian type 206 are encoded and/or transformed by HTMET 202 to a format consistent with the format used in the processor 102. Firmware residual data module 208 provides data to the NVRAM 108, to the operating system module 210, to the program module 212 and to the image module 214. HTMET 202 communicates with image module 214 and with NVRAM 108, operating system module 210 and program module 212.

Type Mask Encoding

The encoding mechanisms are designed for minimal storage and optimal performance. They include:

1. Scalar Type Encode (STE): 1 to 1 mapping between data and encoding scalar types.

2. Reduction: Reducing an aggregate type by replacing each aggregate component with STEs.

3. Compression: Successive similar aggregates or scalars are compressed into single count encoded STEs.

4. Iteration: Successive patterns of different aggregates or scalars are encoded as iterative patterns of compressions, reductions and STEs.

5. Type Extension: Defined types or aggregates can be maintained (not Reduced) through extension of the STE encoding sequence, producing a hierarchy of type mask encodings (HTME).

6. Versioning: Various versions of HTME can be encoded in an HTME.

Scalar Type Encode

Premise: for each scalar (base) type, the sizes in bytes are implementation specific; the size will not affect the proposal.

Premise: for notational convenience, the data or image is characterized as a series of types in a structure STRUCT.

1. For each of the 2^(K) scalar types, assign a K-bit field encoding (e.g., 8 types when K=3).

2. For each scalar type S in STRUCT, form a K bit mask component encoding S. Maintain a precise order from S's position in STRUCT.
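The two steps above can be sketched as follows. This is a minimal illustration, assuming a hypothetical table of eight scalar types (K = 3) and an arbitrary choice to pack the first field into the highest bits; the specification fixes neither the type names, the code values, nor the packing order.

```python
# Hypothetical table of 2**K scalar types (K = 3); names and codes are assumptions.
K = 3
SCALAR_TYPES = ["CHAR", "SHORT", "INT", "LONG", "FLOAT", "DOUBLE", "PTR", "BOOL"]
CODE = {name: index for index, name in enumerate(SCALAR_TYPES)}

def ste(struct_fields):
    """Return one K-bit code per scalar, in the order the fields appear in STRUCT."""
    return [CODE[field] for field in struct_fields]

def pack(codes, k=K):
    """Pack the codes into one integer mask, first field in the highest bits."""
    mask = 0
    for code in codes:
        mask = (mask << k) | code
    return mask

def unpack(mask, count, k=K):
    """Recover `count` K-bit codes from a mask produced by pack()."""
    codes = []
    for _ in range(count):
        codes.append(mask & ((1 << k) - 1))
        mask >>= k
    return codes[::-1]

fields = ["INT", "CHAR", "LONG"]
mask = pack(ste(fields))
```

Because each code has a fixed width, the position of a code in the mask identifies the position of the scalar in STRUCT, which is what allows the order to be preserved.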

Scalar Type Encode: (STE) ##STR1##

Reduction

Simple Reduction:

1. For each simple aggregate (all scalars) type A, reduce A to scalars then STE.

Simple Reduction: (sR) ##STR2##

Reduction:

    ______________________________________
    1. For each complex aggregate type A_(complex) build linear ordering as:
    2.   For each simple aggregate perform a Simple Reduction
    3.   For each scalar perform an STE
    4.   For each complex aggregate perform a Reduction
    ______________________________________

Reduction: (R) ##STR3##

Reduction is used when aggregate types are unique to a version. For multi-version encoding, skipping a reduction in-line can decrease storage as versions can point to out-of-line reduction.
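Reduction can be pictured as an order-preserving flattening of nested aggregates. The sketch below is an illustration only: it assumes aggregates are represented as nested Python lists of scalar type names, a representation that is not part of the specification. A simple reduction is the case where the list holds no nested lists.

```python
def reduce_aggregate(aggregate):
    """Flatten a nested aggregate into a linear, order-preserving list of scalar types."""
    flat = []
    for member in aggregate:
        if isinstance(member, (list, tuple)):
            # A nested aggregate (simple or complex) is reduced recursively.
            flat.extend(reduce_aggregate(member))
        else:
            # A scalar is emitted as-is and would then be scalar type encoded.
            flat.append(member)
    return flat

# A complex aggregate holding a scalar, a simple aggregate and a nested complex aggregate.
complex_aggregate = ["INT", ["CHAR", "CHAR"], ["LONG", ["SHORT", "INT"]]]
flattened = reduce_aggregate(complex_aggregate)
```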

Compression

For the set P composed of tuples (scalar_(i), scalar_(i)), where i ∈ S := {scalar types}, compress each tuple to (scalar_(i) 2). This extends to any series of T scalars of the same type (scalar_(i1), scalar_(i2), . . . , scalar_(iT)), which is replaced with (scalar_(i) T).

Compression: (C) ##STR4##

Compression encoding also enables the ability to `skip` a series of data. For example, in the case of CHAR, BE(CHAR)=LE(CHAR), where BE and LE are transforms on CHAR. Thus CHAR N can signal a skip of N scalars of type CHAR, a potential transform time savings of O(N).
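Compression as described here behaves like run-length encoding over the reduced scalar sequence. The following sketch assumes the sequence is a list of type names and represents each count encoded STE as a `(type, count)` tuple; both choices are illustrative.

```python
from itertools import groupby

def compress(scalars):
    """Collapse each run of identical scalar types into a (type, count) pair."""
    return [(scalar_type, len(list(run))) for scalar_type, run in groupby(scalars)]

reduced = ["CHAR", "CHAR", "CHAR", "INT", "LONG", "LONG"]
encoded = compress(reduced)
```

With this form, an entry such as `("CHAR", 3)` lets a transform advance three bytes without touching them, which is the skip described above.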

Iteration

It is common for an aggregate data type A to be composed of patterns of types. If a pattern P_(a) is in A, and a similar pattern P_(b) follows in A, and if P_(i)^(encode) = Compression(Reduction(STE(P_(i)))), replace the pair (P_(a)^(encode) P_(b)^(encode)) with (ITERATE 2 P_(a)^(encode)). This extends for any series of T patterns to (ITERATE T P_(a)^(encode)).
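One way to find such repeated patterns is a greedy left-to-right scan. The sketch below is not taken from the specification: the greedy strategy, the `max_len` bound and the `("ITERATE", T, pattern)` tuple layout are all assumptions made for illustration, and the input is assumed to be a list of `(type, count)` pairs.

```python
def iterate(encoded, max_len=8):
    """Fold consecutive repeats of a pattern into ("ITERATE", T, pattern) entries."""
    out, i, n = [], 0, len(encoded)
    while i < n:
        best = None  # (entries covered, pattern length, repeats)
        for p in range(1, min(max_len, (n - i) // 2) + 1):
            pattern = encoded[i:i + p]
            reps = 1
            while encoded[i + reps * p:i + (reps + 1) * p] == pattern:
                reps += 1
            if reps >= 2 and (best is None or reps * p > best[0]):
                best = (reps * p, p, reps)
        if best:
            covered, p, reps = best
            out.append(("ITERATE", reps, encoded[i:i + p]))
            i += covered
        else:
            out.append(encoded[i])
            i += 1
    return out

def expand(encoded):
    """Inverse of iterate(): unroll every ITERATE entry."""
    out = []
    for entry in encoded:
        if entry[0] == "ITERATE":
            out.extend(entry[2] * entry[1])
        else:
            out.append(entry)
    return out

source = [("INT", 1), ("CHAR", 4)] * 3 + [("LONG", 1)]
folded = iterate(source)
```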

Iteration: (I)(shown with R) ##STR5##

Type Extension

For the set of defined types T, it is possible to supplement the OPCODE with new encodings by increasing the size of K to K+|T|. OPCODEs above those defined for Iteration are references to additional types (HTMEs). The number of additional types is 2^(K+|T|). The Iteration COUNT still applies. The level of sub-references is conceptually infinite but in practice is finite and is an indication of the complexity of a program. Type extension forms the HIERARCHY in the Data Type Encoding Mask.

Type Extension: (TE) ##STR6##

The converse of type extension (out-of-line encoding) is in-line encoding. In this case, no additional types are encoded and referenced in the HTME. All encoding results in a linear flattened series of base scalar types. There are cases where type extension is very useful.
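The relationship between out-of-line and in-line encoding can be sketched with a table of extended opcodes, each referring to another encoding. The table contents (`POINT`, `RECT`) and the `(opcode, count)` tuple layout are hypothetical and serve only to show how references form a hierarchy that can be flattened back to base scalar types.

```python
# Hypothetical out-of-line table: each extended opcode refers to another encoding (an HTME).
EXTENSIONS = {
    "POINT": [("INT", 2)],
    "RECT": [("POINT", 2), ("CHAR", 1)],
}

def flatten(encoding, extensions=EXTENSIONS):
    """Resolve extended opcodes recursively into an in-line series of (base type, count)."""
    out = []
    for opcode, count in encoding:
        if opcode in extensions:
            # Follow the reference once per COUNT; references may nest.
            for _ in range(count):
                out.extend(flatten(extensions[opcode], extensions))
        else:
            out.append((opcode, count))
    return out

hierarchical = [("RECT", 1), ("LONG", 1)]
inline = flatten(hierarchical)
```

The hierarchical form stores `RECT` and `POINT` once however often they are referenced, while the in-line form repeats their contents at every use. The sketch does not re-compress adjacent runs after flattening.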

Versioning

Referring now to FIG. 3, a block diagram showing hierarchical-type mask encoding with two versions will be described.

A first version V.x 302 in HTMET 202 has an initial entry created in HTMET 202. A second version V.y 304 is added in HTMET 202 with connection to components of version V.x 302 for reuse of components in version V.x 302. The versioning created in accordance with the preferred embodiment of the present invention is efficient in processor time and storage since reuse of common components reduces processing time and additional storage.

For each version V required of an HTME (possibly with TE):

1. Create an initial entry into the existing HTME

2. Re-use components where applicable

3. Add components where applicable
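The three versioning steps can be sketched with a store that keeps shared components in one table and records each version as an ordered list of component names. The class name, method names and component names below are hypothetical; the specification does not prescribe a data structure.

```python
class VersionedHTME:
    """Illustrative store: components are kept once and shared between versions."""

    def __init__(self):
        self.components = {}  # component name -> encoding
        self.versions = {}    # version id -> ordered list of component names

    def add_version(self, version, parts):
        """parts is a list of (name, encoding); encoding None means re-use."""
        entry = []  # step 1: the initial entry for this version
        for name, encoding in parts:
            if encoding is None:
                if name not in self.components:
                    raise KeyError(name)  # step 2: re-use requires an existing component
            else:
                if name in self.components:
                    raise ValueError("component already defined: " + name)
                self.components[name] = encoding  # step 3: add a new component
            entry.append(name)
        self.versions[version] = entry

    def encoding(self, version):
        """Concatenate the component encodings that make up one version."""
        out = []
        for name in self.versions[version]:
            out.extend(self.components[name])
        return out

store = VersionedHTME()
store.add_version("V.x", [("header", [("INT", 2)]), ("body", [("CHAR", 8)])])
store.add_version("V.y", [("header", None), ("body", None), ("trailer", [("LONG", 1)])])
```

Here version V.y adds only one component of its own and refers to the two components already stored for V.x.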

As indicated in "Reduction of Aggregate Types", the choice of in-line reduction versus out-of-line reduction is a performance and storage trade-off. For example, consider s and t below. If |SIZE|<ε, then Reduce in-line, otherwise Reduce out-of-line and add references.

In-line vs. Out-of-line Reduction

    ______________________________________
    struct {          struct {
      int a;            char a, b, c;
      SIZE L;           SIZE L;
      long c;           Mem m;
    } s;              } t;

    INT 1 HTME (L) LONG 1
    vs.
    CHAR 3 L-OPCODE 1 HTME (M) 1
    ______________________________________
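The threshold rule can be sketched as a small decision function. This is an interpretation, not the specification's definition: it assumes |SIZE| means the number of entries in the member's own encoding, and it uses a hypothetical `("REF", name)` entry to stand for an out-of-line reference.

```python
def reduce_member(name, member_encoding, table, eps):
    """Return the entries to place in the parent encoding for one aggregate member."""
    if len(member_encoding) < eps:
        # In-line: copy the member's encoding directly into the parent.
        return list(member_encoding)
    # Out-of-line: store the member's encoding once and add a reference to it.
    table[name] = list(member_encoding)
    return [("REF", name)]

table = {}
small = reduce_member("L", [("INT", 1)], table, eps=2)
large = reduce_member("M", [("INT", 1), ("CHAR", 4), ("LONG", 2)], table, eps=2)
```

A larger ε favors in-line copies (fewer lookups during transformation), while a smaller ε favors shared out-of-line entries (less storage when a member is used in many places).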

Transform

A transformation HTMET on data (data or instructions) D is:

HTMET (D)≈T (HTME (D))≈T (STE(C(I(R|sR(TE(V(D))))))) such that T is of the form:

    ______________________________________
    1.  For each version V in HTMET(D)
    2.    For each d in V
    3.      if d is a scalar   // STE, R, or C encoding as base types
    4.        // implementation specific for size and hardware support
    5.        do byte reversals
    6.      if d is a TE
    7.        locate TE offset and do T(d) at 2
    8.      if d is an Iteration
    9.        for |d's encoding of ITERATE|
    10.         do T(i) : i is one of the next d's COUNT encodings
    ______________________________________

The implementation specific byte re-ordering is characterized by the base scalar types in an HTME. In the typical case, scalars are converted as described in PowerPC 601 RISC Microprocessor User's Manual, IBM Microelectronics, Rev 1.0, 1993, and PowerPC Reference Platform Specification V1.1, IBM, September 1994. The mechanisms to perform the re-ordering can be a software algorithm or can have hardware assistance, such as the byte-reversed load and store instructions lwbrx and stwbrx. For computers where the notion of base scalar types and Endian transformations are different, a replacement of the implementation byte re-ordering mechanism is needed.
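A software version of the transform T for a single version can be sketched as a walk over the encoding that byte-reverses each scalar in a buffer. The scalar sizes, the `(opcode, count)` and `("ITERATE", count, pattern)` tuple layouts and the `extensions` table are assumptions made for illustration; real sizes are implementation specific.

```python
SIZES = {"CHAR": 1, "SHORT": 2, "INT": 4, "LONG": 8}  # hypothetical sizes in bytes

def transform(buf, encoding, extensions=None, offset=0):
    """Byte-reverse every scalar described by `encoding` in place; return the end offset."""
    extensions = extensions or {}
    for entry in encoding:
        if entry[0] == "ITERATE":
            _, count, pattern = entry
            for _ in range(count):
                offset = transform(buf, pattern, extensions, offset)
            continue
        opcode, count = entry
        if opcode in extensions:
            # Type extension: follow the reference once per COUNT.
            for _ in range(count):
                offset = transform(buf, extensions[opcode], extensions, offset)
        elif SIZES[opcode] == 1:
            # Single-byte scalars read the same in both byte orders: skip the run.
            offset += count
        else:
            size = SIZES[opcode]
            for _ in range(count):
                buf[offset:offset + size] = buf[offset:offset + size][::-1]
                offset += size
    return offset

# Big Endian INT 0x01020304, three CHARs, big Endian SHORT 0x0506.
data = bytearray(b"\x01\x02\x03\x04abc\x05\x06")
end = transform(data, [("INT", 1), ("CHAR", 3), ("SHORT", 1)])
```

After the call, `data` holds the same three fields in little Endian layout and `end` is the number of bytes consumed.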

Referring now to FIG. 4, the components of HTMET 202 will be described in greater detail.

HTMET 202 encodes and/or transforms data 402 to data prime 404, which is the encoded and transformed version of data 402. Data 402 may be buffered in buffer 406 prior to the hierarchical-type mask encoding in encoder 408. After encoding in encoder 408, which will be further described with reference to its component modules, the encoded data is transformed by transform module 410 as described above, with the output being data prime 404.

The modules which are included in type mask encoding in encoder 408 are all modules which have been described above in this specification. These modules include versioning 412, type extension 414, reduction 416 which may include standard reduction and simple reduction, iteration 418, compression 420, and scalar-type encoding (STE) 422.

Although the present invention and its advantages have been described in detail, it should be understood that various changes, substitutions and alterations can be made herein without departing from the spirit and scope of the invention as defined by the appended claims.

What is claimed is:
 1. A data processing system, comprising: a processor, for processing instructions and data; a storage subsystem, for storing instructions and data for use in said processor, said storage subsystem comprising a hierarchical type mask encoding and transform module, wherein the hierarchical type mask encoding and transform module is used for encoding and transforming data in a first data format having a hierarchical data structure to a second encoded data format that preserves said hierarchical data structure for use in said processor; and an input-output subsystem for controlling interaction between said processor, said storage subsystem and peripheral devices.
 2. A data processing system, according to claim 1, wherein said storage subsystem further comprises: one or more storage areas for storing data in said first format; and one or more storage areas for storing data in said second format.
 3. A data processing system, according to claim 1, wherein said hierarchical type mask encoding and transform module includes a plurality of transforms and wherein said storage subsystem further comprises: means for detecting said hierarchical data structure; and means for selectively applying zero, one, or more of said plurality of transforms to said first data format in response to said first data format structure.
 4. A data processing system, according to claim 1, wherein said input-output subsystem further comprises: one or more controllers for controlling command and data flow between said processor, said storage subsystem and said peripheral devices.